Introduction

Get an overview of the contents and understand the structure of this section.

In the early hours of the morning as you are sleeping in bed, your phone starts to ring. It's not the normal ring that you've set for friends and family but the red-alert ring you set for emergencies. As you are startled awake by the noise, you begin to come to your senses. You think of the recent release of your company's application. A sense of dread fills you as you pick up the call to be greeted by the automated voice on the other end, informing you that you've been requested to join a priority video conference with a team debugging a live site problem with the new release. You get out of bed quickly and join the call.

Once you are on the call, you are greeted by the on-call triage team. The triage team informs you that the application is experiencing a service outage affecting one of your largest customers, which represents a substantial portion of your company’s revenue. This outage has been escalated by the customer to the highest levels of your company. Even your CEO is aware of the outage. The triage team is unable to determine the cause of the downtime and has called you in to help mitigate the issue and determine the root cause of the outage.

You go to work to determine the root cause. You open your administrative dashboard for the application but find no information about the application. There are no logs, no traces, and no metrics. The application is not emitting telemetry to help you to debug the outage. You are effectively blind to the runtime behavior of the application and what is causing the outage. A feeling of overwhelming terror fills you as you fear this could be the end of your company if you are unable to determine what is causing the outage.

What we've just described is a recurring nightmare: an outage in which we lack the information needed to determine the runtime state of the application.

Without being able to introspect the runtime state of our application, we are effectively blind to what may be causing abnormal behaviors in the application. We are unable to diagnose and quickly mitigate issues. It is a profoundly helpless and terrifying position to be in during an outage.

Observability is the ability to understand the internal state of an application by measuring the outputs of that application and its infrastructure. We will focus on three outputs from an application: logs, traces, and metrics. In this section, we will learn how to instrument, generate, collect, and export telemetry data so that we never find ourselves without insight into the runtime behavior of our application. We will use the OpenTelemetry SDKs to instrument a Go client and server so that the application emits telemetry to the OpenTelemetry Collector service. The Collector will then transform and export that telemetry data to backend systems to enable visualization, analysis, and alerting.
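To make the three outputs concrete before we bring in OpenTelemetry, here is a minimal sketch that uses only the Go standard library. It is not the OpenTelemetry API: the `Signals` type, its `Log`, `Trace`, and `Count` methods, and the `HandleRequest` function are hypothetical names invented for this illustration. The sketch only shows roughly what it means for one unit of work to emit a log line, a timed span, and a counter.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a hypothetical record of one timed operation.
type Span struct {
	Name     string
	Duration time.Duration
}

// Signals is a hypothetical in-memory stand-in for a telemetry backend.
// It is NOT part of OpenTelemetry; it only illustrates the three outputs.
type Signals struct {
	// Logs holds discrete events described as text.
	Logs []string
	// Spans holds timed operations (a much simplified view of traces).
	Spans []Span
	// Metrics holds named counters.
	Metrics map[string]int64
}

// NewSignals returns an empty Signals store with its counter map ready to use.
func NewSignals() *Signals {
	return &Signals{Metrics: make(map[string]int64)}
}

// Log records a formatted event.
func (s *Signals) Log(format string, args ...interface{}) {
	s.Logs = append(s.Logs, fmt.Sprintf(format, args...))
}

// Trace runs fn and records how long it took as a span.
func (s *Signals) Trace(name string, fn func()) {
	start := time.Now()
	fn()
	s.Spans = append(s.Spans, Span{Name: name, Duration: time.Since(start)})
}

// Count adds delta to the named counter.
func (s *Signals) Count(name string, delta int64) {
	s.Metrics[name] += delta
}

// HandleRequest is a hypothetical unit of work that emits all three signals.
func HandleRequest(s *Signals, user string) {
	s.Trace("HandleRequest", func() {
		s.Log("handling request for user=%s", user)
		s.Count("requests_total", 1)
	})
}

func main() {
	s := NewSignals()
	HandleRequest(s, "alice")
	HandleRequest(s, "bob")

	fmt.Println("logs:", len(s.Logs))
	fmt.Println("spans:", len(s.Spans))
	fmt.Println("requests_total:", s.Metrics["requests_total"])
}
```

In a real system, these signals would leave the process rather than sit in memory, which is the role the OpenTelemetry SDKs and the Collector play in the rest of this section.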

Let's get started by learning about OpenTelemetry, its components, and how OpenTelemetry can enable a vendor-agnostic approach to observability. The code used in this section is derived from here with some changes made to provide additional clarity.

Structure

We will cover the following topics in this section:

An Introduction to OpenTelemetry
Logging With Context
Instrumenting for Distributed Tracing
Instrumenting for Metrics
Alerting on Metrics Abnormalities
Summary and Quiz on Observability With OpenTelemetry

Section structure: Observability With OpenTelemetry
